The Java Sound API is a low-level API for effecting and controlling input and output of audio media. It provides explicit control over the capabilities commonly required for audio input and output in a framework that promotes extensibility and flexibility.
Because sound is so fundamental, Java Sound fulfills the needs of a wide range of customers, with potential application areas ranging from communication frameworks and media players to games and content-creation tools.
Java Sound enables applications to extend audio support with specialized capabilities, and integrates well with architectures that provide higher-level interfaces and integration with other media types.
Java Sound provides the lowest level of audio support on the Java platform. It provides a high degree of control over audio-specific functionality. For example, it provides mechanisms for installing, accessing, and manipulating system resources such as digital audio and MIDI (Musical Instrument Digital Interface) devices. It does not include sophisticated sound editors and GUI tools; rather, it provides a set of capabilities upon which such applications can be built. It emphasizes low-level control beyond that commonly expected by the end user, who benefits from higher-level interfaces built on top of Java Sound.
Note: Throughout this document, the word "application" refers generically to Java applets as well as Java applications.
The Java Sound API includes support for both digital audio and MIDI data. These two major modules of functionality are provided in separate packages:
javax.sound.sampled
This package specifies interfaces for capture, mixing, and playback of digital
(sampled) audio.
javax.sound.midi
This package provides interfaces for MIDI synthesis, sequencing, and event
transport.
Two other packages permit service providers (as opposed to application developers) to create custom components that can be installed on the system: javax.sound.sampled.spi and javax.sound.midi.spi.
The next section of this document discusses the sampled-audio system, including an overview of the javax.sound.sampled API. The final section covers the MIDI system and the javax.sound.midi API.
The javax.sound.sampled package handles digital audio data, also referred to as sampled audio. ("Samples" refer to successive snapshots of a signal, which in the case of digital audio is a sound wave. For example, the audio recorded for storage on compact discs is sampled 44100 times per second. Typically, sampled audio comes from a sound recording, but the sound could instead be synthetically generated. The term "sampled audio" refers to the type of data, not its origin. Sampled audio can be thought of as the sound itself, whereas MIDI data can be thought of as a recipe for creating musical sound.)
Java Sound does not assume a specific audio hardware configuration; it is
designed to allow different sorts of audio components to be installed on a system and
accessed by the API. Java Sound
supports common functionality such as input and output from a sound card (for
example, for recording and playback of sound files) as well as mixing of
multiple streams of audio. Here is one example of a typical audio architecture
for which Java Sound might be used:
In this example, a device such as a sound card has various input and output ports, and mixing is provided in software. The MIDI synthesizer shown as one of the mixer's audio inputs might also be a feature of the sound card, or it might be implemented in software. (The javax.sound.midi package, discussed later, supplies a Java interface for synthesizers.)
The major concepts used in the javax.sound.sampled package are described in the sections below.
A line is an element of the digital audio "pipeline," such as an audio input or output port, a mixer, or an audio data path into or out of a mixer. The audio data flowing through a line can be mono or multichannel (for example, stereo). Each type of line will be described shortly, but first some of their functional relationships will be illustrated, showing the flow of audio through the "pipeline." The following diagram shows different types of lines in a simple audio-output system:
In this example, an application has asked a mixer for one or more available clips and source data lines. A clip is a mixer input into which you can load audio data prior to playback; a source data line is a mixer input that accepts a real-time stream of audio data. The application preloads audio data from a sound file into the clips, and then pushes other audio data into the source data lines. The mixer reads data from these lines, each of which may have its own reverberation, gain, and pan controls, and uses the reverb settings to mix the "dry" audio signals with the reverberated ("wet") mix. The mixer delivers its final output to one or more output ports, such as a speaker, a headphone jack, and a line-out jack.
Although the various lines are depicted as separate rectangles in the diagram, they are all "owned" by the mixer, and can be considered integral parts of the mixer. The reverb, gain, and pan rectangles represent processing controls (rather than lines) that can be applied by the mixer to data flowing through the lines. (Note that this is just one example of a possible audio system that is supported by the API. Not all audio configurations will have all the features illustrated. An individual source data line might not support panning, a mixer might not implement reverb, and so on.)
A simple audio-input system might be similar:
Here, data flows in to the mixer from one or more input ports, commonly the microphone or the line-in jack. Gain and pan are applied, and the mixer delivers the captured data to an application via the mixer's target data line. A target data line is a mixer output, containing the mixture of the streamed input sounds. The simplest mixer has just one target data line, but some mixers can deliver captured data to multiple target data lines simultaneously.
The different types of line will now be examined more closely. Several types of line are defined by subinterfaces of the basic Line interface. The interface hierarchy is shown below.
The base interface, Line, describes the minimal functionality common to all lines, such as open and close operations, events, and an informational object (Line.Info) that indicates what mixer (if any) sends its mixed audio data as output directly to the line, and what mixer (if any) gets audio data as input directly from the line. Subinterfaces of Line may have corresponding subclasses of Line.Info that provide other kinds of information specific to the particular types of line.
Closing a line indicates that any resources used by the line may now be released. To free up resources, applications should close lines whenever they are not in use, and must close all opened lines when exiting.
Mixers are assumed to be shared system resources, and can be opened and closed repeatedly. Other lines may or may not support re-opening once they have been closed. Mechanisms for opening lines vary with the different sub-types and are documented where they are defined.
Lines generate events, which are represented by the LineEvent class. Two types of LineEvent are OPEN and CLOSE, but subinterfaces of Line can introduce other types of events. When a line generates an OPEN or CLOSE event, the event is sent to all objects that have registered to "listen" for events on that line. Such objects must implement the LineListener interface. An application can create these objects, register them to listen for line events, and react to the events as desired.
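The following is a minimal sketch of such a listener, using the LineListener and LineEvent type names as they appear in shipping versions of the javax.sound.sampled package (the exact event classes in this document may differ):

    import javax.sound.sampled.Line;
    import javax.sound.sampled.LineEvent;
    import javax.sound.sampled.LineListener;

    public class LineEventLogger implements LineListener {
        // Called by the line whenever it generates an event.
        public void update(LineEvent event) {
            if (event.getType() == LineEvent.Type.OPEN) {
                System.out.println("Line opened: " + event.getLine());
            } else if (event.getType() == LineEvent.Type.CLOSE) {
                System.out.println("Line closed: " + event.getLine());
            }
        }

        // Registers this listener on any line obtained from a mixer.
        public void listenTo(Line line) {
            line.addLineListener(this);
        }
    }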
Ports are simple lines for input or output of audio to or from audio devices. The Port interface has an inner class, Port.Info, that specifies the type of port. Some common types are the microphone, line input, CD-ROM drive, speaker, headphone, and line output.
The Mixer
interface represents a hardware or software device that has
one or more input lines and one or more output lines. This definition means
that a mixer need not actually mix data; it might have only a single input.
The Mixer
API is intended to encompass a variety of devices, but the typical
case supports mixing.
The Mixer
interface provides methods for obtaining a mixer's lines. These can
include target data lines, from which an application can read captured audio
data, source data lines, to which an application can write audio data for
playback (rendering), and clips, in which an application can preload sound data
for playback. The mixer can lock these resources. For example, if the mixer
has only one target data line and it is already in use, an attempt by an
application to obtain a target data line will cause an exception to be
thrown.
You can query a mixer for lines of different types, by passing the
appropriate type of Line.Info
. You can also ask the mixer how many
lines of a particular type it supports.
A mixer maintains textual information about its specific device type
in an inner class called Mixer.Info
. This information includes
the product's name, version, and vendor, along with a textual description.
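For illustration, here is a rough sketch of enumerating the installed mixers and their descriptive information; it assumes the getMixerInfo, getMixer, and Mixer.Info accessor methods found in shipping versions of the API:

    import javax.sound.sampled.AudioSystem;
    import javax.sound.sampled.Mixer;

    public class MixerQuery {
        public static void main(String[] args) {
            // Ask the AudioSystem for descriptions of all installed mixers.
            Mixer.Info[] infos = AudioSystem.getMixerInfo();
            for (int i = 0; i < infos.length; i++) {
                System.out.println(infos[i].getName() + " (" + infos[i].getVendor()
                        + "): " + infos[i].getDescription());
                // A Mixer object for a given description is obtained like this:
                Mixer mixer = AudioSystem.getMixer(infos[i]);
                System.out.println("  source line types: "
                        + mixer.getSourceLineInfo().length);
                System.out.println("  target line types: "
                        + mixer.getTargetLineInfo().length);
            }
        }
    }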
Notice that the generic Line
interface does not provide a means to start
and stop playback or recording. For that you need a data line. The
DataLine
interface supplies additional media-related features beyond those of a Line. A data line has an audio format (represented by an AudioFormat object) that specifies the arrangement of the bytes in the audio stream. Some of the format's properties are the number of channels, the sample rate, the sample size, and the encoding technique. Common encoding techniques include linear pulse-code modulation (PCM), mu-law encoding, and a-law encoding. A data line also generates START and STOP events, which are produced when active presentation or capture of data from or to the data line starts or stops.

An application can obtain a data line from a mixer. If the data line cannot be allocated because of resource constraints (for example, if the mixer supports only one target data line and it is already in use), an exception is thrown.
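As a sketch of how an application might describe a format and request a matching data line (the 44.1 kHz, 16-bit stereo parameters are arbitrary, and the DataLine.Info idiom follows shipping versions of the API):

    import javax.sound.sampled.AudioFormat;
    import javax.sound.sampled.AudioSystem;
    import javax.sound.sampled.DataLine;
    import javax.sound.sampled.LineUnavailableException;
    import javax.sound.sampled.SourceDataLine;

    public class ObtainLine {
        public static SourceDataLine openOutputLine() throws LineUnavailableException {
            // 44.1 kHz, 16-bit, stereo, signed, little-endian linear PCM.
            AudioFormat format = new AudioFormat(44100f, 16, 2, true, false);

            // Describe the kind of line we want, then ask the AudioSystem for it.
            DataLine.Info info = new DataLine.Info(SourceDataLine.class, format);
            SourceDataLine line = (SourceDataLine) AudioSystem.getLine(info);
            line.open(format);   // may fail if the resource is already in use
            return line;
        }
    }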
A TargetDataLine
receives audio data from a mixer.
Commonly, the mixer has captured audio data from
a port such as a microphone; it might process or mix this captured audio before
placing the data in the target data line's buffer. The TargetDataLine
interface provides methods
for reading the data from the target data line's buffer and for
determining how much data is currently available for reading. If an
application attempts to read more data than is available, the read method
blocks until the requested amount of data is available. This applies even if
the amount of data requested is greater than the
line's buffer size. The read method returns if the line is
closed, paused, or flushed.
Applications recording audio should read data from the target data line fast
enough to avoid overflow of the buffer, which results in discontinuities in
the captured data. If the buffer does overflow, the oldest queued data is
discarded and replaced by new data.
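A rough capture sketch follows; the format, buffer sizing, and time-based loop are arbitrary choices, and real applications would add error handling:

    import java.io.ByteArrayOutputStream;
    import javax.sound.sampled.AudioFormat;
    import javax.sound.sampled.AudioSystem;
    import javax.sound.sampled.DataLine;
    import javax.sound.sampled.LineUnavailableException;
    import javax.sound.sampled.TargetDataLine;

    public class CaptureSketch {
        public static byte[] capture(long millis) throws LineUnavailableException {
            AudioFormat format = new AudioFormat(44100f, 16, 1, true, false);
            DataLine.Info info = new DataLine.Info(TargetDataLine.class, format);
            TargetDataLine line = (TargetDataLine) AudioSystem.getLine(info);
            line.open(format);
            line.start();                          // begin capturing

            ByteArrayOutputStream out = new ByteArrayOutputStream();
            // Read in whole sample frames so the buffer stays frame-aligned.
            byte[] buffer = new byte[1024 * format.getFrameSize()];
            long end = System.currentTimeMillis() + millis;
            while (System.currentTimeMillis() < end) {
                // Read promptly so the line's buffer does not overflow.
                int count = line.read(buffer, 0, buffer.length);
                if (count > 0) {
                    out.write(buffer, 0, count);
                }
            }
            line.stop();
            line.close();
            return out.toByteArray();
        }
    }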
A SourceDataLine
receives audio data for playback. It provides
methods for writing data to the source data line's buffer for playback, and for
determining how much data the line is prepared to receive without blocking.
If an application attempts to write more data than the line is prepared to receive, the write
even if the amount of data requested is greater than the line's buffer size.
The write method also returns if the line is closed, paused, or flushed.
Applications playing audio should write data to the source data line fast
enough to avoid underflow (emptying) of the buffer, which may result in
discontinuities in audio playback. If audio playback stops due to underflow, a
STOP
event is generated. A START
event is generated
when presentation resumes.
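Here is a comparable playback sketch; it assumes the byte array holds whole sample frames in the given format:

    import javax.sound.sampled.AudioFormat;
    import javax.sound.sampled.AudioSystem;
    import javax.sound.sampled.DataLine;
    import javax.sound.sampled.LineUnavailableException;
    import javax.sound.sampled.SourceDataLine;

    public class PlaybackSketch {
        // Writes previously captured or decoded PCM data to an output line.
        public static void play(byte[] audioData, AudioFormat format)
                throws LineUnavailableException {
            DataLine.Info info = new DataLine.Info(SourceDataLine.class, format);
            SourceDataLine line = (SourceDataLine) AudioSystem.getLine(info);
            line.open(format);
            line.start();

            int offset = 0;
            while (offset < audioData.length) {
                // write() blocks until the requested amount has been queued.
                offset += line.write(audioData, offset, audioData.length - offset);
            }
            line.drain();   // wait for the queued data to finish playing
            line.stop();
            line.close();
        }
    }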
A Clip
is a data line into which audio data can be loaded prior
to playback. Because the data is pre-loaded rather than streamed, the
clips duration is known before playback, and you can choose any starting
position in the media. Clips can be looped, meaning that upon playback, all
the data between two specified loop points will repeat a specified number of
times, or indefinitely.
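A small sketch of preloading and looping a clip follows; the file name is only an example, and the loop count and sleep are arbitrary:

    import java.io.File;
    import javax.sound.sampled.AudioInputStream;
    import javax.sound.sampled.AudioSystem;
    import javax.sound.sampled.Clip;
    import javax.sound.sampled.DataLine;

    public class ClipSketch {
        public static void main(String[] args) throws Exception {
            // Pre-load a sound file into a clip ("chime.wav" is a placeholder name).
            AudioInputStream stream =
                    AudioSystem.getAudioInputStream(new File("chime.wav"));
            DataLine.Info info = new DataLine.Info(Clip.class, stream.getFormat());
            Clip clip = (Clip) AudioSystem.getLine(info);
            clip.open(stream);

            clip.setFramePosition(0);            // start from the beginning
            clip.loop(3);                        // play the clip, looping through it
            // clip.loop(Clip.LOOP_CONTINUOUSLY) would repeat indefinitely.

            // Wait roughly long enough for the looped playback to finish.
            Thread.sleep(clip.getMicrosecondLength() / 1000 * 4);
            clip.close();
        }
    }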
A GroupLine is a synchronized group of data lines. If a
mixer supports group lines, you can specify which data lines should be
treated as a group. Then you can start, stop, or close all those
data lines by sending a single message to the group, instead of
having to control each line individually.
Data lines and ports often
have a set of controls that affect the audio signal passing through the line.
The way in which the signal is affected depends on the type of control.
The Java Sound API defines the following subclasses of Control: GainControl, PanControl, ReverbControl, and SampleRateControl. A GainControl object can be queried for its resolution and for the minimum and maximum gain values it permits. The resolution is expressed as the number of increments over which the range of possible values is distributed. For the other controls, as with GainControl, a change can be made gradually instead of immediately, and each control can be queried for its resolution and its minimum and maximum possible values.

Programmatically, you obtain a particular control object from a line through a reference to the control's class. You can also obtain an array of all the controls for that line.
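In shipping versions of the API, gain is commonly exposed as a FloatControl identified by FloatControl.Type.MASTER_GAIN rather than by a dedicated GainControl class; assuming that representation, a sketch of adjusting a line's gain might look like this:

    import javax.sound.sampled.FloatControl;
    import javax.sound.sampled.Line;

    public class GainSketch {
        // Lowers a line's gain by 6 dB if the line exposes a master-gain control.
        public static void reduceGain(Line line) {
            if (line.isControlSupported(FloatControl.Type.MASTER_GAIN)) {
                FloatControl gain =
                        (FloatControl) line.getControl(FloatControl.Type.MASTER_GAIN);
                float target = Math.max(gain.getMinimum(), gain.getValue() - 6.0f);
                gain.setValue(target);
            }
        }
    }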
The AudioSystem class serves as an application's entry point for
accessing the installed sampled-audio resources.
You can query the AudioSystem to learn what sorts of audio components
have been installed, and then you can obtain access to them.
For example, an application
might start out by asking the AudioSystem
class whether there is
a mixer that has a certain configuration, such as one of the input or output
configurations illustrated earlier in the discussion of lines. From the mixer,
the application would then obtain data lines, and so on.
Here are some of the resources an application can obtain from the AudioSystem. A mixer is obtained by passing the AudioSystem the desired mixer's Mixer.Info object. You can learn what mixers are available by invoking the getMixerInfo method, which returns an array of Mixer.Info objects. A line can also be obtained directly from the AudioSystem, without dealing explicitly with mixers.
An application can use format conversions to translate audio data
from one format to another. (See the discussion of AudioFormat above.)
Format conversions are often used to compress and decompress audio data.
An application can query the AudioSystem
to learn what translations
are supported. It can then pass the AudioSystem
a stream of audio data
and get back a translated stream in a particular format.
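As a sketch of requesting such a translation (assuming the isConversionSupported and getAudioInputStream conversion methods found in shipping versions of AudioSystem, and an arbitrary 16-bit PCM target format):

    import javax.sound.sampled.AudioFormat;
    import javax.sound.sampled.AudioInputStream;
    import javax.sound.sampled.AudioSystem;

    public class ConversionSketch {
        // Attempts to convert a stream to 16-bit signed linear PCM at its own sample rate.
        public static AudioInputStream toPcm(AudioInputStream source) {
            AudioFormat in = source.getFormat();
            AudioFormat out = new AudioFormat(
                    AudioFormat.Encoding.PCM_SIGNED,
                    in.getSampleRate(), 16, in.getChannels(),
                    in.getChannels() * 2,            // frame size in bytes
                    in.getSampleRate(), false);      // frame rate, little-endian
            if (AudioSystem.isConversionSupported(out, in)) {
                return AudioSystem.getAudioInputStream(out, source);
            }
            return source;                            // no suitable converter installed
        }
    }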
An audio stream is an input stream with a specified audio data format
(AudioFormat
) and data length. The AudioInputStream
class represents such a stream, from which you can read bytes. Some audio input streams
permit you to remember positions in the stream and skip around in it.
The AudioSystem
class provides methods for translating between
audio files and audio streams. The AudioSystem
can also report the file format of a sound file and can write
files in different formats. A file format is represented by the
AudioFileFormat
class. An AudioFileFormat
includes an
AudioFormat
, the file's length, and its type (WAV, AIFF, AU, etc.).
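A short sketch of reading one sound file and rewriting it in another format follows; the file names are examples, and the available target types depend on the installed file writers:

    import java.io.File;
    import javax.sound.sampled.AudioFileFormat;
    import javax.sound.sampled.AudioInputStream;
    import javax.sound.sampled.AudioSystem;

    public class FileSketch {
        public static void main(String[] args) throws Exception {
            File source = new File("input.aiff");    // placeholder file names
            File target = new File("output.wav");

            // Inspect the file format, then read the file as an audio stream.
            AudioFileFormat fileFormat = AudioSystem.getAudioFileFormat(source);
            System.out.println("Type: " + fileFormat.getType()
                    + ", data format: " + fileFormat.getFormat());

            AudioInputStream stream = AudioSystem.getAudioInputStream(source);
            // Write the same audio data back out as a WAV file.
            AudioSystem.write(stream, AudioFileFormat.Type.WAVE, target);
            stream.close();
        }
    }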
Service provider interfaces for the sampled audio system are defined in
the javax.sound.sampled.spi
package. Service providers can extend
the classes defined here so that their own audio devices, sound file parsers
and writers, and format converters can be installed and made available by
a Java Sound implementation.
Interfaces describing MIDI event transport, synthesis, and sequencing are
defined in the javax.sound.midi
package. The major
concepts used in the package are described in the sections below.
The diagram below illustrates the functional relationships between the
major components in a typical Java Sound MIDI configuration. (Java
Sound permits a variety of devices to be installed and interconnected. The system shown
here is just one possible scenario.) The
flow of data between components is indicated by arrows. The data can be
in a standard file format, or (as indicated by the key in the lower right
corner of the diagram), it can be audio, raw MIDI bytes, or Java Sound's
MidiEvent
objects.
In this example, the application prepares a musical performance by
loading a musical score that is stored as a Standard MIDI File on a disk
(lower left corner of the diagram). Standard MIDI files contain tracks,
each of which is a list of time-tagged MIDI events. This MIDI file is
read into a Sequence
object, whose data structure reflects
the file. A Sequence
contains a set of Track
objects, each of which contains a set of MidiEvent
objects.
The Sequence
is then "performed" by a Sequencer
.
A Sequencer
performs its
music by sending MidiEvents
to some other device, such
as an internal or external synthesizer.
As illustrated, MidiEvents
must
be translated into raw (non-time-tagged) MIDI before being sent through a MIDI
output port to an external synthesizer. This conversion is accomplished by a MIDI-output device
called a StreamGenerator
. Similarly,
raw MIDI data coming into the computer from an external MIDI source is
translated into MidiEvents
by a StreamParser
.
The internal synthesizer (the rectangle marked "Synthesizer" in the diagram)
accepts MidiEvents
directly from the Sequencer
or
StreamParser
. It parses each event and usually dispatches a corresponding
command (such as noteOn
) to one of its MidiChannel
objects,
according to the MIDI channel number specified in the event.
(The MIDI Specification calls for 16 MIDI channels, so a Synthesizer
typically has 16 MidiChannel
objects.)
The MidiChannel
uses the note information in these messages
to synthesize music. For example, a noteOn
message specifies the note's
pitch and "velocity" (volume).
However, the note information is insufficient; the synthesizer also
requires precise instructions on how to create the audio signal for each note. These
instructions are represented by an Instrument
.
Each Instrument
typically emulates a different real-world musical
instrument or sound effect. The Instruments
might come as presets with
the synthesizer, or they might be loaded from soundbank files. In the synthesizer,
the Instruments
are arranged by bank number (the rows in the diagram) and program
number (the columns). An Instrument
can make use of stored digital audio, included
as Sample
objects in the soundbank. For example, to play the sound of a trumpet
playing a 5-second-long note, the synthesizer might loop (cycle) through a half-second snippet of a
recording of a trumpet.
Now that the components have been introduced from a functional perspective, we will take a brief look at the API from a programmatic perspective.
A MidiEvent
object specifies the type, data length,
and status byte of the raw MIDI message for which it serves as a wrapper. In addition, it
provides a tick value that is used by devices involved in MIDI timing, such as sequencers.
There are three categories of events, each represented by a MidiEvent
subclass:
ShortEvents are the most common and have at most two data bytes following the status byte. SysexEvents contain system-exclusive MIDI messages. They may have many bytes, and generally contain manufacturer-specific instructions. MetaEvents occur in MIDI files, but not in raw MIDI data streams. Meta events contain data, such as lyrics or tempo settings, that might be useful to sequencers but are usually meaningless for synthesizers.

The base interface for devices is MidiDevice
. All
devices provide methods for listing the set of MIDI modes that they support, and for
querying and setting the current mode. (The mode is a combination of MIDI's
Omni mode and Mono/Poly mode.) Devices can be opened and closed,
and they provide descriptions of themselves through a MidiDevice.Info
object.
The following diagram illustrates the MidiDevice
interface hierarchy. Also depicted are
two classes, connected by dashed lines to the MidiDevice
interfaces they implement.
Devices are generally either transmitters or receivers of MidiEvents
.
The Transmitter
subinterface of MidiDevice
includes methods for setting and querying the
receivers to which the transmitter is sending MidiEvents
. From the perspective of
a transmitter, these receivers fall into two categories: MIDI Out and MIDI Thru.
The transmitter sends events that it generates itself to its
MIDI Out receivers. If the transmitter is itself also a receiver, it passes along events
that it has received from elsewhere to its MIDI Thru receivers.
The Receiver
subinterface of MidiDevice
consists of a single method for receiving MidiEvents
. Typically, this method is invoked
by a Transmitter
.
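As a sketch of connecting a transmitter to a receiver (this assumes the getTransmitter and getReceiver methods found in shipping versions of javax.sound.midi, which may differ from the device model described here):

    import javax.sound.midi.MidiSystem;
    import javax.sound.midi.Receiver;
    import javax.sound.midi.Sequencer;
    import javax.sound.midi.Synthesizer;
    import javax.sound.midi.Transmitter;

    public class WireSketch {
        public static void main(String[] args) throws Exception {
            Sequencer sequencer = MidiSystem.getSequencer();
            Synthesizer synthesizer = MidiSystem.getSynthesizer();
            sequencer.open();
            synthesizer.open();

            // Route the sequencer's MIDI output to the synthesizer's input.
            Transmitter transmitter = sequencer.getTransmitter();
            Receiver receiver = synthesizer.getReceiver();
            transmitter.setReceiver(receiver);
        }
    }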
Java Sound includes concrete classes for converting between MidiEvent
objects
and the raw byte stream used in MIDI wire protocol. A
StreamGenerator
is a Receiver
that accepts
MidiEvent
objects from a Transmitter
and writes out a
raw MIDI byte stream. Similarly, a
StreamParser
is a Transmitter
that
accepts a raw MIDI byte stream and writes the corresponding
MidiEvent
objects to its Receiver
.
A Synthesizer
is a type of MidiDevice
that generates sound. The Synthesizer
interface extends both Receiver
and
Transmitter
.
It provides methods for manipulating soundbanks
and instruments. In addition, it provides access to a
set of MIDI channels through which sound is actually produced. A Synthesizer
receives MidiEvents
and invokes corresponding MidiChannel
messages.
MidiChannels
have methods representing the common MIDI voice messages
such as "note on" and "control change." They also permit queries of the
channel's current state.
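For example, here is a minimal sketch of playing a single note through the synthesizer's channels (note number 60 is middle C; the velocity and timing are arbitrary):

    import javax.sound.midi.MidiChannel;
    import javax.sound.midi.MidiSystem;
    import javax.sound.midi.Synthesizer;

    public class NoteSketch {
        public static void main(String[] args) throws Exception {
            Synthesizer synth = MidiSystem.getSynthesizer();
            synth.open();

            // Play middle C on channel 0 for one second, at a moderate velocity.
            MidiChannel[] channels = synth.getChannels();
            channels[0].programChange(0);      // select the first instrument (program 0)
            channels[0].noteOn(60, 80);
            Thread.sleep(1000);
            channels[0].noteOff(60);

            synth.close();
        }
    }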
Like Synthesizer
, the Sequencer
interface extends
both Transmitter
and Receiver
(and therefore
MidiDevice
).
Sequencer
adds methods for basic MIDI sequencing operations. A sequencer can load and
play back a sequence, query and set the tempo, and control the master and
slave sync modes. An application can register to be notified when the
sequencer processes MetaEvents
and controller events. (A controller event occurs when
a MIDI controller, such as a pitch-bend wheel or a data slider, changes its value. These
events are not MidiEvents
, but are created when the Sequencer
encounters
certain ShortEvents
in the Sequence
.)
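A sketch of playing a sequence and listening for meta events follows; it assumes the MetaEventListener callback and MetaMessage type found in shipping versions of the API, and uses the standard end-of-track meta type (0x2F):

    import javax.sound.midi.MetaEventListener;
    import javax.sound.midi.MetaMessage;
    import javax.sound.midi.MidiSystem;
    import javax.sound.midi.Sequence;
    import javax.sound.midi.Sequencer;

    public class SequencerSketch {
        public static void play(Sequence sequence) throws Exception {
            final Sequencer sequencer = MidiSystem.getSequencer();
            sequencer.open();
            sequencer.setSequence(sequence);

            // Be notified of meta events, such as the end-of-track marker (type 0x2F).
            sequencer.addMetaEventListener(new MetaEventListener() {
                public void meta(MetaMessage message) {
                    if (message.getType() == 0x2F) {
                        sequencer.close();
                    }
                }
            });

            sequencer.setTempoInBPM(120f);
            sequencer.start();
        }
    }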
The Sequence
object represents a MIDI sequence as one or
more tracks and associated timing information. A track contains a list of
time-stamped MIDI events. Sequences can be read from MIDI files, or created from scratch
and edited by adding Tracks
to the Sequence
(or removing them).
Similarly, MidiEvents
can be added to or removed from the Tracks
.
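Here is a sketch of building a small Sequence from scratch; it assumes the ShortMessage class found in shipping versions of the API as the concrete message type wrapped by each MidiEvent, and the notes and timing are arbitrary:

    import javax.sound.midi.MidiEvent;
    import javax.sound.midi.Sequence;
    import javax.sound.midi.ShortMessage;
    import javax.sound.midi.Track;

    public class SequenceSketch {
        public static Sequence buildScale() throws Exception {
            // Timing expressed in pulses per quarter note (PPQ), 4 ticks per quarter.
            Sequence sequence = new Sequence(Sequence.PPQ, 4);
            Track track = sequence.createTrack();

            int[] notes = {60, 62, 64, 65, 67};   // a few notes of a C major scale
            for (int i = 0; i < notes.length; i++) {
                ShortMessage on = new ShortMessage();
                on.setMessage(ShortMessage.NOTE_ON, 0, notes[i], 90);
                track.add(new MidiEvent(on, i * 4L));        // note starts

                ShortMessage off = new ShortMessage();
                off.setMessage(ShortMessage.NOTE_OFF, 0, notes[i], 0);
                track.add(new MidiEvent(off, i * 4L + 3));   // note ends
            }
            return sequence;
        }
    }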
It is not necessary to load a MIDI file into a Sequence
object before
playing the file. The setSequence(java.io.InputStream)
method of
Sequencer
lets you read a MIDI file directly into a Sequencer
, without
creating a Sequence
object first.
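A minimal sketch (the file name is only an example):

    import java.io.BufferedInputStream;
    import java.io.FileInputStream;
    import java.io.InputStream;
    import javax.sound.midi.MidiSystem;
    import javax.sound.midi.Sequencer;

    public class StreamPlaySketch {
        public static void main(String[] args) throws Exception {
            Sequencer sequencer = MidiSystem.getSequencer();
            sequencer.open();

            // Feed the MIDI file to the sequencer without building a Sequence first.
            InputStream in = new BufferedInputStream(new FileInputStream("song.mid"));
            sequencer.setSequence(in);
            sequencer.start();
        }
    }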
MidiSystem
acts as an application's entry point to the MIDI music system. It
provides information about, and access to,
the set of installed devices, including transmitters, receivers, synthesizers,
and sequencers.
The MidiSystem
class provides methods for reading MIDI files to create
Sequence
objects, and for writing Sequences
to MIDI files. A MIDI Type 0 file contains only one track, while a Type 1
file may contain any number. MidiSystem
also provides methods to create Soundbank
objects by parsing soundbank
files.
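A sketch of reading a MIDI file into a Sequence and writing it back out as a Type 1 file (the file names are examples, and whether Type 1 writing is available depends on the installed file writers):

    import java.io.File;
    import javax.sound.midi.MidiSystem;
    import javax.sound.midi.Sequence;

    public class MidiFileSketch {
        public static void main(String[] args) throws Exception {
            // Read a MIDI file into a Sequence, then write it back out as a Type 1 file.
            Sequence sequence = MidiSystem.getSequence(new File("input.mid"));
            System.out.println("Tracks: " + sequence.getTracks().length);
            MidiSystem.write(sequence, 1, new File("copy.mid"));
        }
    }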
Configuration of the MIDI system is handled in the
javax.sound.midi.spi package
. The abstract classes in this package
allow service providers to supply and install their own MIDI devices, MIDI file
readers and writers, and soundbank file readers.
Copyright © 1999 Sun Microsystems, Inc. All Rights Reserved. Please send comments to: javasound-comments@javamedia.eng.sun.com